Montano et al. Genome Biology 2013, 14:R94
http://genomebiology.com/content/14/8/R94



Genome Biology



METHOD 



Open Access 



Measuring cell-type specific differential 
methylation in human brain tissue 



Carolina M Montano 1,2, Rafael A Irizarry 3, Walter E Kaufmann 4, Konrad Talbot 5, Raquel E Gur 5, Andrew P Feinberg 6
and Margaret A Taub 7*



Abstract 

The behavior of epigenetic mechanisms in the brain is obscured by tissue heterogeneity and disease-related 
histological changes. Not accounting for these confounders leads to biased results. We develop a statistical 
methodology that estimates and adjusts for cell-type composition by decomposing neuronal and non-neuronal
differential signal. This method provides a conceptual framework for deconvolving heterogeneous epigenetic data 
from postmortem brain studies. We apply it to find cell-specific differentially methylated regions between 
prefrontal cortex and hippocampus. We demonstrate the utility of the method on both Infinium 450k and CHARM 
data. 
Background

The brain is a particularly good example of highly spe- 
cialized and diverse functions arising from the same 
genetic program. Epigenetic mechanisms copy informa- 
tion other than the sequence itself during cell division, 
such as DNA methylation and chromatin arrangements 
[1]. Therefore, epigenetics is an attractive substrate for 
understanding specialized brain function and its disrup- 
tion in disease. An example of an epigenetic mechanism 
is DNA methylation, which at CpG dinucleotides is 
heritable during cell division, because that sequence is 
recognized by a DNA methyltransferase on newly repli- 
cated strands. In post-mitotic cells such as neurons, 
DNA methylation has been shown to contribute to 
memory formation [2], other types of synaptic plasticity 
[3], drug addiction [4], and reversible behavior in the 
honeybee Apis mellifera [5]. Neurological diseases have 
also been linked to mutations in DNA methyltrans- 
ferases [6] and methyl-CpG-binding proteins [7]. 

Despite its importance, the epigenetic profile of the 
brain has not yet been explored in depth due to, among 
other factors, brain region and cell-type heterogeneity. 
The cerebral cortex has distinct functional regions, each
organized into cell layers of neurons and glia that vary
throughout the cortex [8]. While neurons are the main signaling
unit, glia play an important role in scaffolding and
maintaining synapses [9]. Epigenetic profiling of neurons 
and non-neurons using the Illumina GoldenGate assay has 
shown that neurons and glia have a unique DNA methyla- 
tion signature that cannot be assessed using samples from 
bulk cortex [10]. This is important because shifts in glial 
cell populations such as oligodendrocytes contribute to 
defects in cortical myelination, and microglia activation has 
been linked to neurodegenerative disorders [11]. 

Epidemiological studies of brain tissue conducted
so far do not account for differences in cell-type
composition [12-14]. Statistical methods for estimating
cell-type composition from genomic profiles have been 
developed for gene expression [15-18], and DNA methyla- 
tion in blood tissue [19] and in brain [20]. DNA methyla- 
tion can then be used to calculate and potentially adjust 
for differing cell proportions, a crucial step when studying 
diseases where cell population shifts occur [21]. 

While DNA methylation data can now be used to cal- 
culate differing cell proportions, individual cell-type pro- 
filing has not been done yet due to the extensive mixture 
combinations required for validation in blood (at least 
five different cell types) [19]. In contrast, cell profiling in 
the brain can be achieved by separating the cell types into
two main compartments: neurons and glia. In a recent 
publication [20], a method is proposed for estimating 
neuron and glia proportions similar to the approach pro- 
posed for whole blood [19]. While this is a useful step 
toward correcting for cell distribution, this approach 
does not permit the unbiased estimation of glia- and neu- 
ron-specific differences between two sets of samples [20]. 
Such calculated cell-type specific analysis offers a crucial 
advantage in studies of the brain, where neurons and glia 
cannot generally be dissociated. For example, many brain 
bank specimens contain pulverized material or even par- 
affin-fixed specimens, for which methods exist to isolate 
DNA for genome-scale methylation analysis [22]. Flow 
sorting, as done here to develop this method, generally 
does not yield sufficient quantities of material for gen- 
ome-scale analysis, and is also extremely labor intensive 
and costly. 

Here we have developed a novel statistical epigenetics 
approach that takes advantage of the stability and cell- 
type specificity of DNA methylation, as well as the fact 
that the brain is made up of two major cell types, neu- 
rons and glia, in order to deconvolve the two main cell 
components in the brain. Thus, the method allows one to 
measure DNA methylation, for example, across brain 
regions, and from those data calculate to a first approxi- 
mation the difference in DNA methylation that is neu- 
ron- or glia-specific. Moreover, once sorted data is 
available for a given brain region, investigators can use 
such data to calculate cell proportions on any unsorted 
sample measured on the same methylation platform 
without the need to sort themselves. This approach 
should have broad application to a range of problems in 
neurodevelopment and disease research. 

Results and discussion 

Estimation of mixture proportions 

We measured DNA methylation profiles for dorsolateral 
prefrontal cortex (DLPFC), hippocampal formation (HF), 
and superior temporal gyrus (STG) samples dissected 
from frozen brains of normal individuals using the com- 
prehensive high-throughput arrays for relative methylation 
(CHARM) technique [23]. We also labeled and separated 
neuronal nuclei in a subset of samples using a neuron- 
specific antibody (NeuN) and fluorescence-activated 
cell sorting (FACS) [24,25]. Neuronal (NeuN+) and non- 
neuronal (NeuN-) fractions from DLPFC, HF, and STG 
were collected for downstream processing and methylation
analysis with CHARM (Additional File 1, Figure S1).

To illustrate the downstream effects of the cell popula- 
tion confounding problem, and focusing on two brain 
regions for clarity, we examined a genomic region for 
which: (1) no difference was observed between DLPFC
and HF in either neuronal or glial fractions; and (2) a difference
was observed between neuronal and glial nuclei
within each brain region (Figure 1a). Note that a strong
methylation difference between brain regions is observed 
between the non-cell-sorted brain samples. This must be a 
false-positive and, as we demonstrate below, must be due 
to differences in cell-type composition between the brain 
regions. 

We modified a statistical method originally developed 
to estimate cell populations in blood [19] to calculate 
neuronal and glial proportions for each of our unsorted 
samples, adapting it to use a constrained linear optimization
model (Figure 1b; see overview in Additional File 1,
Figure S2a). We confirmed that our approach effectively 
estimated these cell proportions using a mixture experi- 
ment with an independent set of samples (Additional File 
1, Figure S2b). To demonstrate that the false-positive 
results of Figure 1 are due to difference in cell-type distri- 
bution, we mathematically reconstructed the unsorted 
sample methylation profile using the pure neuronal and 
glial profiles and their estimated frequencies and pre- 
dicted this result (Additional File 1, Figure S2c). 

While the above results rely on having neuronal and 
glial methylation signals for each brain region, we per- 
formed additional analyses to determine whether accurate 
estimates of neuronal and glial proportions in unsorted 
samples from a brain region could be obtained using 
selected data from another brain region. Figure 1c shows
the accuracy of estimates obtained from such 'universal' 
data, compared to estimates based on sorted data from 
each individual brain region. We also accurately reproduce 
the cell proportion estimates from our mixture experiment 
(Additional File 1, Figure S2d, see Materials and Methods 
for additional details of how this analysis was performed). 
Our results indicate that accurate estimates could be 
obtained for a new brain region without the need to sort 
samples from that region. 

Generative model of methylation signal 

Currently, obtaining cell-type specific DMRs from 
unsorted samples is a mathematically intractable problem. 
However, because in human postmortem brain samples 
we are interested in just two cell fractions (neurons and 
glia), we were able to develop a novel statistical procedure 
to perform this deconvolution. The methylation signal for
any sample i at a given genomic location, $Y_i$, can be modeled
as a linear combination of the methylation levels of
neuronal and glial fractions in the brain region where
sample i was obtained. Specifically, for any given CpG, the
DNAm profile of a mixed sample can then be written as
(see Materials and Methods):

$$Y_i = \mu_{D,+} + (\mu_{D,-} - \mu_{D,+})\,\pi_i + (\mu_{H,+} - \mu_{D,+})\,X_i(1 - \pi_i) + (\mu_{H,-} - \mu_{D,-})\,X_i\pi_i + \varepsilon_i$$



Figure 1 The proportion of neuronal cells in a given brain region influences the identification of differentially methylated regions. 

(a) Whole-tissue methylation signals show false-positive brain-region differences. Panel shows a plot of smoothed methylation signals from 
sorted neuronal and glial cells (teal and purple lines) from DLPFC and HF (solid and dashed lines) as well as from whole-tissue DLPFC (gold line) 
and HF (grey line). (b) Estimated neuronal fraction of cells for whole-tissue samples differs between DLPFC and HF (mean DLPFC = 0.53 (n = 19),
mean HF = 0.30 (n = 13), two-sample t-test P value 6.3 × 10^-6). (c) Estimated neuronal fraction of cells for whole-tissue samples using universal
DMRs vs. estimated neuronal fraction using brain region-specific DMRs from DLPFC (gold), HF (grey), and STG (blue). 



Here, we define $\mu_{D,+}$ and $\mu_{D,-}$ to be the methylation
level of neuronal and glial fractions, respectively, in
DLPFC, with $\mu_{H,+}$ and $\mu_{H,-}$ defined similarly for HF. For
each sample i, $X_i$ is 1 if sample i was obtained from HF
and 0 for DLPFC samples. We let $\pi_i$ be the fraction of
glia in sample i, so that $1 - \pi_i$ is the fraction of neurons.
Finally, $\varepsilon_i$ represents biological variability and measurement
error. The statistical insight is that because the term
$\pi_i$ can be estimated with high precision (Additional File 1,
Figure S2c), it can be treated as fixed. With this assumption
in place, the equation above is actually a linear model
of the form

$$Y_i = \beta_0 + \beta_1\pi_i + \beta_2 X_i(1 - \pi_i) + \beta_3 X_i\pi_i + \varepsilon_i,$$

in which the parameters $\beta_2$ and $\beta_3$ represent the quantities
we are interested in measuring, that is, the differences
in neurons and glia, respectively, between brain regions.
We refer to this model as M2. Fitting this linear model by
least squares and obtaining estimates for millions of genomic
locations is computationally feasible (fitting the
model for 4 million probes took about 5 seconds on our
laptop).
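The speed of fitting M2 across many loci can be illustrated with a minimal sketch in Python with NumPy (the authors worked in R; this is not their code). The data are simulated, the group sizes and mean glial fractions are taken from Figure 1b, and the coefficient values, noise level, and number of loci are assumptions chosen purely for illustration. Because every locus shares the same design matrix, a single least-squares call returns the estimates for all loci at once.

```python
import numpy as np

rng = np.random.default_rng(0)

n_d, n_h, n_loci = 19, 13, 1000           # group sizes as in Figure 1b; locus count is arbitrary
x = np.r_[np.zeros(n_d), np.ones(n_h)]    # X_i: 1 for HF, 0 for DLPFC

# pi_i: glial fraction, treated as known (simulated around 1 - mean neuronal fraction)
pi = np.clip(np.r_[rng.normal(0.47, 0.08, n_d),
                   rng.normal(0.70, 0.08, n_h)], 0.05, 0.95)

# Design matrix for M2: intercept, pi, X(1 - pi), X * pi
design = np.column_stack([np.ones_like(x), pi, x * (1 - pi), x * pi])

# Hypothetical true coefficients: a neuron-specific difference at the first 50 loci only
beta = np.zeros((4, n_loci))
beta[0] = 0.3         # mu_{D,+}
beta[1] = 0.4         # mu_{D,-} - mu_{D,+}
beta[2, :50] = 0.2    # mu_{H,+} - mu_{D,+}
y = design @ beta + rng.normal(0, 0.02, (x.size, n_loci))

# One least-squares call fits every locus (each column of y is one locus)
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

neuron_diff = beta_hat[2]   # estimates of beta_2 (neuron-specific difference)
glia_diff = beta_hat[3]     # estimates of beta_3 (glia-specific difference)
```

In this sketch the rows of `beta_hat` are the four M2 coefficients, so `neuron_diff` and `glia_diff` hold the per-locus estimates of the two cell-type-specific differences.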

This statistical framework also exposes the problem with 
existing naive approaches to assess DNA methylation sig- 
natures in mixed samples. To date, most published ana- 
lyses ignore cell composition [26-30] and look for 
associations in a way equivalent to fitting a simple linear 
regression model $Y_i = \alpha_0 + \alpha_1 X_i + \varepsilon_i$ (where the t-test is
derived from $X_i$ = 0 or 1). We refer to this model as
M1. In M1, the parameter $\alpha_1$ represents a combination of
the methylation differences in neurons and glia in which it
is impossible to deconvolve cell-type-specific contributions.
Furthermore, we can mathematically demonstrate that the
least squares estimate of $\alpha_1$ will be biased by differences in
cell-type frequency under the null hypothesis of no difference
in methylation between brain regions (Figure 2a, see
Materials and Methods). Similarly, a naive model suggested by
Guintivano et al. [20] that incorporates cell-type proportions,
$Y_i = \gamma_0 + \gamma_1 X_i + \gamma_2\pi_i + \varepsilon_i$ (we refer to this as model
M3), will lead to biased results as well, and to decreased
power to detect methylation differences (Additional File 1,
Figure S3). We also note that even the superior methods
show a small amount of bias (boxplot not centered at 0), 
which can be explained by slightly inaccurate mixture esti- 
mates (see Materials and Methods). 

To test the utility of our model, we confirmed our theo- 
retical results with experimental data. First, we obtained 
estimates of significant neuron-specific methylation differ- 
ences between DLPFC and HF using sorted brain samples 
(gold standard, FDR <0.05, Additional File 1, Table S1).
We then used the unsorted brain data to calculate the 
parameters representing the differences in brain-region 
methylation using models M1 (total methylation difference,
$\alpha_1$) and M2 (neuron-specific methylation difference,
$\beta_2$). Figure 2b shows that we can estimate neuron-specific
methylation differences more accurately with model M2. 
Therefore, we can assess neuron-specific methylation dif- 
ferences between DLPFC and HF using whole tissue after 
estimating cell proportions. 

Using the sorted samples, we did not find statistically sig- 
nificant DMRs in the non-neuronal fraction, which high- 
lights the importance of isolating a neuronal signal from 
total methylation values. The result is in agreement with 
recently published literature suggesting that glial cells, contained
in the NeuN- fraction, have less diverse transcription
patterns across brain regions than neurons [31], the
latter having a distinct DNA-methylation signature [10]. 
Interestingly, proteins involved in modifying chromatin 
Figure 2 Effects of direct modeling on false-positives and accuracy. (a) Explicit modeling for differences in cell type reduces false-positive rate.
Boxplots of test statistics for the difference in means based on linear regression estimation from models M1 and M2. Eighty percent of regions from
M1 show a statistically significant difference in overall mean (at level 0.05); 16% and 12% of regions from M2 show a statistically significant difference
in neurons or glia, respectively (at level 0.05). (b) Explicit modeling of neuronal methylation differences improves estimation accuracy. Comparison of
gold-standard mean difference in methylation in neuron-specific DMRs to the estimated mean difference from models M1 (left) and M2 (right), along
with the linear regression fit to the data (95% CI for the slope of the regression line of M1 = (0.29, 0.44), for M2 = (0.68, 0.95)).



were found among the brain-region neuronal DMRs, sup- 
porting the role of epigenetic mechanisms in neuronal 
function and synaptic plasticity [32]. For example, neuron- 
specific methylation of the histone methyltransferase 
SETD3, which methylates histone H3 at lysine 36, was 
lower in HF than in DLPFC, and histone deacetylase 
HDAC4 shows hypomethylation in DLPFC. Other genes 
involved in neural differentiation include JAG1, TTL1, 
NPAS4, CUX-2, DOCK2, NGEF, OLFM1, SATB2, and 
GIT2. 

Application to Illumina Infinium HumanMethylation450
Dataset

While the CHARM platform has many advantages for 
studying methylation patterns due to the high density and 
location of probes, the assay requires restriction-enzyme 
digestion and lacks single-base resolution. The Illumina
Infinium HumanMethylation450 (450K) array has emerged 
as an affordable alternative to obtain reliable quantitative 
measurements of methylation. To demonstrate the perfor- 
mance of our method on data from the 450K array, we 
used data accessible at NCBI GEO database (Guintivano et 
al. [20], accession GSE41826), consisting of 77 normal 
samples from prefrontal cortex, of which 29 were sorted 
into neuronal and glial fractions, nine were mixtures of 
neurons and glia of known proportions, and 10 were 
unsorted, whole-tissue samples. We first applied our 
method to obtain accurate cell-fraction estimates on the 
known mixture samples (Additional File 1, Figure S4a). 
Using these cell-fraction estimates and the pure neuronal 
and glial profiles, we mathematically reconstructed the 
methylation profile for the mixture samples in a set of 
genomic regions and compared these results to the actual 
observed methylation for these samples (Additional File 1,
Figure S4b). The cell proportion calculations agreed with
Guintivano et al.'s estimates for prefrontal cortex. Our 
CHARM cell proportion estimates are on average higher 
than those obtained using 450K arrays, as the CHARM 
data were sampled using 2 mm dermal biopsy punches to 
minimize white matter contamination. The mathematical 
reconstruction of the methylation signal was also done for 
the unsorted samples (Additional File 1, Figure S4c). 

Given that sorted data on the 450K array are only avail- 
able for one brain region, we cannot demonstrate our 
improved ability to detect true brain-region differences in 
cell-type specific methylation on this platform. However, 
to show our ability to reduce false-positive signal, we 
constructed an artificial comparison by grouping the 
mixture samples with the highest and lowest neuronal 
fractions and applied models M1 and M2 to look for differences
between these two groups. Any such differences
are clearly due only to cell-fraction variation, and model 
M2 reduces the number of false-positive signals (Addi- 
tional File 1, Figure S4d), as we saw for our CHARM data 
(Figure 2a). These results indicate that our methods apply 
well to data from the 450K array. 

Conclusions 

We describe an algorithm to address a gap in the analysis 
of methylation data from complex tissues with varying 
degrees of cell-type heterogeneity such as the brain. To 
appropriately measure the methylation differences 
between two brain cortical regions, we separated a small 
number of samples of the brain nuclei into neuronal and 
non-neuronal fractions by cell sorting, and developed a 
statistical method to account for cell heterogeneity in a set 
of unsorted samples by decomposing the signal into its 
two components. Our proposed method takes advantage 
of the separation of the brain cells into two nuclei fractions.
The neuronal fraction encompasses a diverse population
of neuronal cells, and the non-neuronal nuclei
contain astrocytes, oligodendrocytes, a minority of NeuN- 
negative neurons, and endothelial cells. To separate the 
methylation signal into more than two fractions is mathematically
plausible, as one can simply define $\pi_i$ as the fraction
of cells of the cell-type of interest, fit model M2, and
consider $\beta_3$. However, investigating how robust our results
are to the noise in cell fraction estimates when there are 
more than two cell types will require further study. 

The experimental design presented here provides for effi- 
cient use of scarce tissue bank resources and limited funds 
for methylation profiling. Once purified methylation pro- 
files are obtained from the brain regions of interest using a 
small number of samples, the gold-standard methylation 
data can be used for any further analysis, and by any 
laboratory, without the need to sort nuclei again. We have 
demonstrated our method on data from both CHARM and 
the Illumina 450K array. To apply our method to a new 
measurement platform or new brain regions, we recom- 
mend performing cell sorting on a subset of the samples to 
first obtain the cell-type specific signals needed for the cell- 
fraction estimation. If brain-region specific data are not 
available, we have also shown that for samples measured 
with CHARM, accurate estimates of cell proportions in 
samples from one brain region could be obtained using 
sorted data from another brain region. We provide a fra- 
mework that can be applied, even retrospectively, to psy- 
chiatric case-control studies using frozen postmortem 
brain samples, and can be easily adapted to other microar- 
ray or sequencing platforms, and to other target tissues. 

Materials and methods 

Generative model of methylation signal 

To illustrate our model, we consider the case of estimating 
differences in methylation between DLPFC (D) and HF 
(H). We assume these brain tissues are composed of two 
cell types, NeuN+ (+) and NeuN- (-). For a fixed genomic 
position, we let $\mu_{j,k}$ be the methylation level in region j,
$j \in \{H, D\}$, and cell type k, $k \in \{+, -\}$. Scientifically, we are
interested in identifying genomic locations where $\mu_{H,k} - \mu_{D,k} \neq 0$,
that is, where NeuN+ or NeuN- have different
methylation levels in the two brain regions.

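Given a sample i and considering a fixed genomic position,
we define $X_i$ as the indicator that sample i is from the
hippocampus, that is, $X_i = 1$ if sample i is from the hippocampus
and 0 otherwise. We also define $\pi_i$ to be the fraction
of sample i that consists of NeuN- cells ($1 - \pi_i$ is the
fraction of NeuN+ cells). We can then derive the expected
value of the methylation signal of sample i at that genomic
position as

$$E(Y_i) = \{\pi_i\mu_{D,-} + (1 - \pi_i)\mu_{D,+}\}(1 - X_i) + \{\pi_i\mu_{H,-} + (1 - \pi_i)\mu_{H,+}\}X_i$$

Rearranging terms gives:

$$E(Y_i) = \mu_{D,+} + (\mu_{D,-} - \mu_{D,+})\,\pi_i + (\mu_{H,+} - \mu_{D,+})\,X_i(1 - \pi_i) + (\mu_{H,-} - \mu_{D,-})\,X_i\pi_i \qquad (1)$$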

Suppose we wanted to estimate whether there is a dif- 
ference in methylation between the two brain regions 
being considered, H and D. If we fit a model with terms 
matching those above, that is, 

$$E(Y_i) = \beta_0 + \beta_1\pi_i + \beta_2 X_i(1 - \pi_i) + \beta_3 X_i\pi_i \qquad \text{(M2)}$$

then our estimated coefficients have interpretations 
equivalent to the generative model in Equation 1. Specifically,
we can test the hypothesis of no difference in
NeuN+ methylation between D and H ($\mu_{H,+} - \mu_{D,+} = 0$)
by testing the hypothesis that $\beta_2 = 0$, and the hypothesis
of no difference in NeuN- methylation between D and H
($\mu_{H,-} - \mu_{D,-} = 0$) by testing the hypothesis that $\beta_3 = 0$.

From the equations above, we can see that estimating
the fraction of cells of each type, $\pi_i$, allows us to explicitly
find locations with brain-region differences specific to 
NeuN+ or NeuN- cells. 

Naive models are biased 

In general, $\pi_i$ is unknown and therefore not included in
the linear model, that is, the model

$$E(Y_i) = \alpha_0 + \alpha_1 X_i \qquad \text{(M1)}$$

is fitted. However, this model does not account for all
the sources of variation in $Y_i$, and the least squares estimate
$\hat{\alpha}_1$ is a biased estimate of the difference in methylation
between H and D under the null hypothesis. To see
this, we can write $E(\hat{\alpha}) = (X^T X)^{-1} X^T E(Y)$, where X is the
design matrix of the above model, $\hat{\alpha}$ is the vector
$(\hat{\alpha}_0, \hat{\alpha}_1)$, and the hats represent least squares estimates.
For simplicity, we assume equal numbers of samples
from H and D. We then have

$$E(\hat{\alpha}_1) = \mu_{H,+} - \mu_{D,+} + (\mu_{H,-} - \mu_{H,+})\,\bar{\pi}_H - (\mu_{D,-} - \mu_{D,+})\,\bar{\pi}_D$$

where $\bar{\pi}_j$ is the mean fraction of NeuN- cells in region
j. Under the null hypothesis of no difference between
D and H in either + or -, we have $\mu_{H,+} - \mu_{D,+} = 0$ and
also $(\mu_{H,-} - \mu_{H,+}) = (\mu_{D,-} - \mu_{D,+}) = \delta$, which gives

$$E(\hat{\alpha}_1) = \delta(\bar{\pi}_H - \bar{\pi}_D).$$

This means that where + and - have different methylation
levels ($\delta \neq 0$), a difference in the fractions of + and -
cells in the different brain regions can lead to false-positive
signals of brain region differences in methylation.
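The bias of model M1 under the null hypothesis can be checked numerically. The following is a minimal sketch in Python with NumPy, not the authors' code: the glial fractions, the common NeuN+ level, and the gap $\delta$ between NeuN- and NeuN+ methylation are hypothetical values. It fits M1 to the noise-free expected signal, so any nonzero slope is pure bias, and compares that slope with $\delta(\bar{\pi}_H - \bar{\pi}_D)$.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 20                                    # samples per region (equal group sizes)
x = np.r_[np.zeros(n), np.ones(n)]        # X_i: 0 for D, 1 for H
# Hypothetical glial fractions: higher on average in H than in D
pi = np.r_[rng.uniform(0.3, 0.6, n), rng.uniform(0.6, 0.9, n)]

# Null hypothesis: same NeuN+ level and same NeuN-/NeuN+ gap (delta) in both regions
mu_plus, delta = 0.2, 0.5
expected_y = mu_plus + delta * pi         # E(Y_i) under the null, no noise added

# Fit M1: intercept plus region indicator only
design_m1 = np.column_stack([np.ones_like(x), x])
alpha_hat, *_ = np.linalg.lstsq(design_m1, expected_y, rcond=None)

# Bias predicted by the formula: delta * (mean pi in H - mean pi in D)
predicted_bias = delta * (pi[x == 1].mean() - pi[x == 0].mean())

print(alpha_hat[1], predicted_bias)       # the two values agree up to floating-point error
```

Although neither cell type differs between regions in this simulation, the fitted region effect is nonzero whenever the mean glial fractions differ.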

Guintivano et al. [20] estimate $\pi_i$ and propose an ad hoc
approach to adjust for this that is approximated by fitting
the following model

$$E(Y_i) = \gamma_0 + \gamma_1 X_i + \gamma_2\pi_i \qquad \text{(M3)}$$
However, this model does not account for all the
sources of variation in $Y_i$ either, and the least squares
estimate $\hat{\gamma}_1$ is a biased estimate of the difference in
methylation between H and D. To see this, we can write
$E(\hat{\gamma}) = (X^T X)^{-1} X^T E(Y)$, where X is the design matrix of
the above model, $\hat{\gamma}$ is the vector $(\hat{\gamma}_0, \hat{\gamma}_1, \hat{\gamma}_2)$, and the
hats represent least squares estimates. For simplicity, we
assume equal numbers of samples from H and D. We
then have

$$E(\hat{\gamma}_1) = \mu_{H,+} - \mu_{D,+} + K\left((\mu_{H,-} - \mu_{H,+}) - (\mu_{D,-} - \mu_{D,+})\right)$$

where K is a function of the $\pi_i$'s that does not depend
on the sample size:

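$$K = \frac{\bar{\pi}_H\left(\overline{\pi^2} - \bar{\pi}^2\right) + \left(\bar{\pi} - \bar{\pi}_H\right)\left(\overline{\pi^2}_H - \bar{\pi}\,\bar{\pi}_H\right)}{\left(\overline{\pi^2} - \bar{\pi}_H^2\right) + 2\bar{\pi}\left(\bar{\pi}_H - \bar{\pi}\right)}$$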

with $\bar{\pi}$ and $\bar{\pi}_H$ the averages of the $\pi_i$'s in all samples and
H samples, respectively, and $\overline{\pi^2}$ and $\overline{\pi^2}_H$ the averages of the
$\pi_i^2$'s in all samples and H samples, respectively. Note that
the bias is directly proportional to the difference between 
NeuN+ and NeuN- fractions, demonstrating that this 
approach is incapable of deconvolving these quantities of 
interest. 
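The factor K can be checked numerically. Because the true signal differs from the M3 terms only by a multiple of $X_i\pi_i$, K equals the coefficient on $X_i$ when $X_i\pi_i$ is regressed on an intercept, $X_i$, and $\pi_i$. The sketch below, in Python with NumPy, uses hypothetical glial fractions and equal group sizes. Its closed form is written as $K = [\bar{\pi}_H(\overline{\pi^2} - \bar{\pi}^2) + (\bar{\pi} - \bar{\pi}_H)(\overline{\pi^2}_H - \bar{\pi}\bar{\pi}_H)] / [(\overline{\pi^2} - \bar{\pi}_H^2) + 2\bar{\pi}(\bar{\pi}_H - \bar{\pi})]$, which is our own algebraic rearrangement of the least-squares solution and should be read as an assumption rather than a quotation.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 15                                    # samples per region (equal group sizes)
x = np.r_[np.zeros(n), np.ones(n)]        # X_i: 0 for D, 1 for H
pi = np.r_[rng.uniform(0.3, 0.6, n), rng.uniform(0.5, 0.9, n)]  # hypothetical glial fractions

# Numerical K: coefficient on X when X * pi is regressed on (1, X, pi)
design_m3 = np.column_stack([np.ones_like(x), x, pi])
coef, *_ = np.linalg.lstsq(design_m3, x * pi, rcond=None)
k_numeric = coef[1]

# Closed form built from sample averages over all samples and over H samples
pbar, pbar_h = pi.mean(), pi[x == 1].mean()
p2bar, p2bar_h = (pi ** 2).mean(), (pi[x == 1] ** 2).mean()
numerator = pbar_h * (p2bar - pbar ** 2) + (pbar - pbar_h) * (p2bar_h - pbar * pbar_h)
denominator = (p2bar - pbar_h ** 2) + 2 * pbar * (pbar_h - pbar)
k_formula = numerator / denominator

print(k_numeric, k_formula)               # agree up to floating-point error
```

Only averages of $\pi_i$ and $\pi_i^2$ enter the closed form, which is why K does not depend on the sample size.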

Estimation of mixture proportions 

Although we have shown that fitting the mis-specified 
model, which does not include the cell-fraction terms, can 
lead to bias under the null hypothesis, the cell fractions for 
a given sample are unknown a priori. At any given methy- 
lation site, we are assuming that there is some underlying 
mean methylation value within each combination of cell 
type (+, -) and brain region (D, H). If we know these 
underlying means, we can derive an estimate of the 
unknown cell fraction at a particular site, given an 
observed methylation signal and assuming the generative 
model above. For example, suppose sample i is from 
D and we observe methylation signal $Y_i$ at a given locus.
From Equation 1, we have

$$E(Y_i) = \mu_{D,+} + \pi_i(\mu_{D,-} - \mu_{D,+}) = \pi_i\mu_{D,-} + (1 - \pi_i)\mu_{D,+} \qquad (2)$$

If we assume $\mu_{D,+}$ and $\mu_{D,-}$ are known, $\pi_i$ is the only
unknown in this equation, so it can be estimated. Note
that we do need to constrain our estimate of $\pi_i$ to be
between 0 and 1. Also, the means $\mu$ are not known, so we
collected data to allow us to estimate these means, by measuring
methylation in pure cell sorted + or - fractions from
each brain region of interest. Given that these methylation
measurements have uncertainty, we want to reduce the
uncertainty in our estimate of $\pi_i$ by using many informative
genomic regions. We first select a set of genomic
regions where + and - methylation differs. We then find
the optimal value of $\pi_i$ to explain the observed methylation
for sample i in these locations, as a function of our estimated
means and $\pi_i$, subject to the constraint that $\pi_i$ is
between 0 and 1. This procedure closely follows that presented
by Houseman et al. [19].
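With only two cell types, the constrained estimation described above reduces to a one-parameter least-squares problem. The following is a minimal sketch in Python with NumPy, not the authors' R implementation: the function name `estimate_glial_fraction`, the simulated region means, and the noise level are all hypothetical. For a single parameter with a box constraint, clipping the unconstrained least-squares solution to [0, 1] gives the constrained optimum, so no general-purpose optimizer is needed here.

```python
import numpy as np

def estimate_glial_fraction(y, mu_plus, mu_minus):
    """Least-squares estimate of pi for one sample, constrained to [0, 1].

    y        : observed methylation of the sample at the selected regions
    mu_plus  : estimated NeuN+ mean methylation at those regions
    mu_minus : estimated NeuN- mean methylation at those regions
    """
    d = mu_minus - mu_plus                       # per-region NeuN- minus NeuN+ difference
    pi_unconstrained = np.dot(y - mu_plus, d) / np.dot(d, d)
    return float(np.clip(pi_unconstrained, 0.0, 1.0))

# Simulated example: 500 informative regions and a true glial fraction of 0.47
rng = np.random.default_rng(3)
n_regions = 500
mu_plus = rng.uniform(0.1, 0.9, n_regions)
mu_minus = np.clip(mu_plus + rng.choice([-0.3, 0.3], n_regions), 0.0, 1.0)
true_pi = 0.47
y = true_pi * mu_minus + (1 - true_pi) * mu_plus + rng.normal(0, 0.05, n_regions)

pi_hat = estimate_glial_fraction(y, mu_plus, mu_minus)
```

Using many regions averages out the measurement noise in any single region, which is the motivation given above for selecting a large set of informative locations.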

Selection of the genomic locations can be based on a 
variety of factors, such as the range of observed methyla- 
tion at these locations, the variance of the methylation 
estimates, and the length of the region of differential 
methylation. For our estimation, we chose the 500 geno- 
mic regions which were the strongest + vs. - DMR candi- 
dates in the brain region of interest in relation to the 
amount of methylation difference and the length of the 
region showing the methylation difference. We found 
that our results were quite robust to the number of 
regions selected, with 500 performing well. 
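The selection step can be sketched as a simple ranking. The text does not give the exact scoring rule, so the sketch below (Python with NumPy) assumes a score equal to the absolute mean methylation difference multiplied by the region length; this score and the function name `top_regions` are hypothetical choices made for illustration.

```python
import numpy as np

def top_regions(mean_diff, length, n_top=500):
    """Return indices of the n_top candidate regions with the largest score.

    The score |mean_diff| * length is an assumed ranking rule, not one
    stated in the text.
    """
    score = np.abs(np.asarray(mean_diff, dtype=float)) * np.asarray(length, dtype=float)
    order = np.argsort(score)[::-1]          # indices sorted by decreasing score
    return order[:n_top]

# Toy example with four candidate regions (scores 1.0, 2.0, 0.6, 5.0)
mean_diff = np.array([0.1, -0.5, 0.3, 0.05])
length = np.array([10, 4, 2, 100])
selected = top_regions(mean_diff, length, n_top=2)
```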

To investigate whether it is absolutely necessary to have 
sorted data from a given brain region to estimate cell pro- 
portions in unsorted data from that region, we identified a 
set of 'universal' genomic regions. These universal regions 
had different NeuN+ and NeuN- methylation signals 
within a brain region, but showed consistent NeuN+ and 
NeuN- methylation levels across the three brain regions 
for which we had data (DLPFC, HF, and STG). Many of 
these + vs. - DMR candidates had consistent NeuN+ and 
NeuN- levels across brain regions, with 14% to 17% of the 
probes in the + vs. - DMRs belonging to genomic regions 
of consistent signal. We estimated the means in these 
regions of consistent signal using sorted data from DLPFC 
alone, and then performed cell fraction estimation in the 
unsorted samples from DLPFC, HF, and STG using these 
mean values. Since we do not know the true cell fractions 
in these unsorted samples, we used the estimates we had 
obtained for each brain region using the region-specific 
DMRs and mean values, as described above, as our gold 
standard. 

All analysis was implemented in R (R Core Team, R: 
A Language and Environment for Statistical Computing. 
R Foundation for Statistical Computing: Vienna, Austria, 
2012; [33]). The data discussed in this publication have 
been deposited in NCBI's Gene Expression Omnibus 
and are accessible through GEO series accession num- 
ber GSE48610. 

Effect of inaccurate mixture estimates 

As previously described, failure to account for differences 
in cell-mixtures in our samples can lead to biased esti- 
mates of brain-region differences under the null hypoth- 
esis of no brain region difference. However, inaccurate 
mixture estimates can also lead to bias. For example, 
consider the methylation signal in sample i 

$$E(Y_i) = \beta_0 + \beta_1\pi_i + \beta_2 X_i(1 - \pi_i) + \beta_3 X_i\pi_i$$
Now suppose we have an inaccurate estimate of $\pi_i$,
called $\pi_i^*$, where $\pi_i^* = \pi_i + \gamma_i$. Using this inaccurate estimate
gives us the following contribution to our regression
formulation from sample i:

$$\begin{aligned}
\beta_0 + \beta_1\pi_i^* + \beta_2 X_i(1 - \pi_i^*) + \beta_3 X_i\pi_i^*
&= \beta_0 + \beta_1(\pi_i + \gamma_i) + \beta_2 X_i(1 - \pi_i - \gamma_i) + \beta_3 X_i(\pi_i + \gamma_i) \\
&= \beta_0 + \beta_1\pi_i + \beta_1\gamma_i + \beta_2 X_i(1 - \pi_i) + \beta_3 X_i\pi_i + (\beta_3 - \beta_2)X_i\gamma_i \\
&= \beta_0 + \beta_1\gamma_i + \beta_1\pi_i + \beta_2 X_i(1 - \pi_i) + \beta_3 X_i\pi_i + (\beta_3 - \beta_2)X_i\left(\eta_i(1 - \pi_i) - (1 - \eta_i)\pi_i\right) \\
&= \beta_0 + \beta_1\gamma_i + \beta_1\pi_i + \beta_2 X_i(1 - \pi_i) + \beta_3 X_i\pi_i + (\beta_3 - \beta_2)\eta_i X_i(1 - \pi_i) - (\beta_3 - \beta_2)(1 - \eta_i)X_i\pi_i \\
&= \beta_0 + \beta_1\gamma_i + \beta_1\pi_i + \left(\beta_2 + (\beta_3 - \beta_2)\eta_i\right)X_i(1 - \pi_i) + \left(\beta_3 + (\beta_2 - \beta_3)(1 - \eta_i)\right)X_i\pi_i,
\end{aligned}$$

where $\eta_i$ is between 0 and 1, and the third line follows
from the fact that $\gamma_i$ must be between $-\pi_i$ and $1 - \pi_i$ to
ensure that $\pi_i^*$ is between 0 and 1. We can see that the
coefficient of $X_i(1 - \pi_i)$ is no longer measuring just the
quantity we are interested in (the difference between
NeuN+ methylation in regions H and D), but it also has
an additional factor related to the size of the estimation
error, and similarly for the coefficient of $X_i\pi_i$.
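The algebra above can be verified numerically. The sketch below (Python with NumPy) uses hypothetical coefficient values and simulated fractions. Solving $\gamma_i = \eta_i(1 - \pi_i) - (1 - \eta_i)\pi_i$ for $\eta_i$ gives $\eta_i = \pi_i + \gamma_i$, which is the inaccurate estimate $\pi_i^*$ itself and therefore lies between 0 and 1. The code checks that the first and last expressions of the derivation agree for every sample.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 50
x = rng.integers(0, 2, n).astype(float)   # X_i: region indicator
pi = rng.uniform(0.1, 0.9, n)             # true glial fractions pi_i
pi_star = rng.uniform(0.0, 1.0, n)        # inaccurate estimates pi_i*
gamma = pi_star - pi                      # estimation errors gamma_i
eta = pi + gamma                          # eta_i solving gamma_i = eta_i(1 - pi_i) - (1 - eta_i)pi_i

b0, b1, b2, b3 = 0.3, 0.4, 0.2, -0.1      # hypothetical coefficients

# First expression: the model evaluated at the inaccurate estimate pi_i*
lhs = b0 + b1 * pi_star + b2 * x * (1 - pi_star) + b3 * x * pi_star

# Last expression: terms regrouped around the true pi_i
rhs = (b0 + b1 * gamma + b1 * pi
       + (b2 + (b3 - b2) * eta) * x * (1 - pi)
       + (b3 + (b2 - b3) * (1 - eta)) * x * pi)
```

The coefficient multiplying $X_i(1 - \pi_i)$ in `rhs` is a mixture of $\beta_2$ and $\beta_3$ weighted by $\eta_i$, which shows how estimation error blends the neuronal and glial differences.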

CHARM DNA methylation analysis 

Genomic DNA was isolated from brains using the Master- 
pure kit from Epicentre, according to the manufacturer's 
protocol. For genome-wide DNA methylation assessment, 
1 μg of genomic DNA from each sample was digested,
fractionated, labeled, and hybridized to a CHARM array as 
described [34,35] using a custom Nimblegen 2.1 million 
feature array assaying 5,114,655 CpG sites. We used the 
Bioconductor package 'charm' for sample preprocessing 
along with the package 'bumphunter' for DMR identifica- 
tion and permutation computation. 

Human postmortem brain samples 

Fluorescence-activated cell sorting was performed on frozen
postmortem dorsolateral prefrontal cortex (n = 4),
hippocampal formation (n = 4), and superior temporal
gyrus (n = 3) from individuals not affected with neurological
or psychiatric disease. To validate the statistical model,
we used nine additional healthy samples from the dorso- 
lateral prefrontal cortex. These samples underwent nuclei 
extraction and sorting as described below. The model was 
applied to additional unsorted control samples (19 samples 
from DLPFC, 13 samples from HF, 31 samples from STG) 
to deconvolve NeuN+ and NeuN- methylation signatures. 
All samples were obtained from the bank of the Center for 
Neurodegenerative Disease Research (CNDR) in the 
Department of Pathology and Laboratory Medicine at the 
University of Pennsylvania (directed by Dr John Q Troja- 
nowski, see Additional File 1, Tables S2-4 for demographic 
information). 



Nuclei extraction, NeuN labeling, and sorting 

Total nuclei were extracted via sucrose gradient centrifugation
as previously described [25]. A total of 250 mg of
frozen tissue per sample was homogenized in 5 mL of 
lysis buffer (0.32 M sucrose, 10 mM Tris pH 8.0, 5 mM
CaCl2, 3 mM Mg acetate, 1 mM DTT, 0.1 mM EDTA,
0.1% Triton X-100) by douncing 50 times in a 40-mL
dounce homogenizer. Lysate was transferred to a 15 mL
ultracentrifugation tube and 9 mL of sucrose solution
(1.8 M sucrose, 10 mM Tris pH 8.0, 3 mM Mg acetate,
1 mM DTT) was pipetted to the bottom of the tube. The
solution was then centrifuged at 27,000 rpm for 2.5 h at
4°C (Beckman, L8-80 M; SW28.1 rotor). After centrifugation,
the supernatant was removed by aspiration and the
nuclei pellet was resuspended in 500 μL of PBS.

The nuclei were incubated in a staining mix (0.71% 
normal goat serum, 0.036% BSA, 1:1200 anti-NeuN
(Millipore, MAB377), 1:1400 Alexa647 goat anti-mouse
secondary antibody (Invitrogen, 21236)) for 45 min
by rotating in the dark at 4°C. Unstained nuclei and 
nuclei stained with only secondary antibody served as 
negative controls. The fluorescent nuclei were run 
through a FACS machine with proper gate settings. A 
small portion of the NeuN+ and NeuN- nuclei were re-run
on the FACS machine to validate the purity. Immunonegative
(NeuN-) nuclei were collected in parallel. To
pellet the sorted nuclei, 2 mL of sucrose solution, 50 μL
of 1 M CaCl2, and 30 μL of Mg acetate were added to 10
mL of nuclei in PBS, incubated on ice for 15 min, then
centrifuged at 3,000 rpm for 20 min. The nuclei pellet
was resuspended in 10 mM Tris (pH 7.5), 4 mM MgCl2,
and 1 mM CaCl2. Fluorescent images were taken on a
Zeiss Axio Observer.Z1 microscope with a Plan-Apochromat
100x/1.40 oil-immersion objective lens. Images
were generated using an Axiocam MR3 microscope camera
and Axiovision software (AxioVs40, version 4.8.2.0,
Carl Zeiss, Inc). Images were processed using ImageJ. 

Additional material 



Additional file 1: Supplementary Information. A PDF file containing
Figures S1-4 and Tables S1-4.